a35fe7f7fe8217b4369a0af4244d1fca-Paper.pdf

Neural Information Processing Systems

Despite their promising performance, the learned knowledge remains implicit in these black-box neural structures, which hinders understanding the importance of input features and how they influence decisions.


(DEMO) Deep Reinforcement Learning Based Resource Allocation in Distributed IoT Systems

Li, Aohan, Tsuzuki, Miyu

arXiv.org Artificial Intelligence

Abstract--Deep Reinforcement Learning (DRL) has emerged as an efficient approach to resource allocation due to its strong capability in handling complex decision-making tasks. However, only limited research has explored the training of DRL models with real-world data in practical, distributed Internet of Things (IoT) systems. To bridge this gap, this paper proposes a novel framework for training DRL models in real-world distributed IoT environments. In the proposed framework, IoT devices select communication channels using a DRL-based method, while the DRL model is trained with feedback information--specifically, Acknowledgment (ACK) information--obtained from actual data transmissions over the selected channels. Implementation and performance evaluation, in terms of Frame Success Rate (FSR), are carried out, demonstrating both the feasibility and the effectiveness of the proposed framework. In recent years, the number of Internet of Things (IoT) devices has grown rapidly, driven by advancements in communication technologies such as LoRa, Sigfox, and NB-IoT, the declining cost of sensors and embedded systems, and the application of artificial intelligence in data processing.
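The core loop described in this abstract (choose a channel, transmit, learn from the ACK) can be illustrated with a minimal sketch. Note this is not the paper's actual model: it substitutes a stateless bandit-style value learner for the full DRL agent, and the names `select_channel` and `update` are hypothetical.

```python
import random

def select_channel(q, epsilon):
    """Epsilon-greedy choice over per-channel value estimates."""
    if random.random() < epsilon:
        return random.randrange(len(q))  # explore a random channel
    return max(range(len(q)), key=lambda c: q[c])  # exploit the best-known channel

def update(q, channel, ack, alpha=0.1):
    """Move the chosen channel's estimate toward the ACK-derived reward (1 = success)."""
    reward = 1.0 if ack else 0.0
    q[channel] += alpha * (reward - q[channel])
    return q
```

For example, after a successful transmission on channel 1, `update(q, 1, True)` raises that channel's estimate, so subsequent greedy selections prefer it.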




Algorithmic Control Improves Residential Building Energy and EV Management when PV Capacity is High but Battery Capacity is Low

Ullner, Lennart, Zharova, Alona, Creutzig, Felix

arXiv.org Artificial Intelligence

Efficient energy management in prosumer households is key to alleviating grid stress in an energy transition marked by electric vehicles (EV), renewable energy and battery storage. However, it is unclear how households optimize prosumer EV charging. Here we study real-world data from 90 households on fixed-rate electricity tariffs in German-speaking countries to investigate the potential of Deep Reinforcement Learning (DRL) and other control approaches (Rule-Based, Model Predictive Control) to manage the dynamic and uncertain environment of Home Energy Management (HEM) and optimize household charging patterns. The DRL agent efficiently aligns charging of EV and battery storage with photovoltaic (PV) surplus. We find that frequent EV charging transactions, early EV connections and PV surplus increase optimization potential. A detailed analysis of nine households (1-hour resolution, 1 year) demonstrates that high battery capacity facilitates self-optimization; in this case further algorithmic control shows little value. In cases with relatively low battery capacity, algorithmic control with DRL improves energy management and cost savings by a relevant margin. This result is further corroborated by our simulation of a synthetic household. We conclude that prosumer households with optimization potential would profit from DRL, thus benefiting also the full electricity system and its decarbonization.
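The rule-based baseline mentioned in this abstract can be sketched in a few lines: charge the EV only from PV surplus, capped at the charger limit. This is an illustrative assumption about what such a rule looks like, not the study's actual controller; the function name and the 11 kW cap are hypothetical.

```python
def charging_power(pv_w, load_w, ev_connected, max_charge_w=11000.0):
    """Rule-based HEM baseline: divert PV surplus to the EV, up to the charger limit."""
    if not ev_connected:
        return 0.0
    surplus = max(pv_w - load_w, 0.0)  # PV generation left after household load
    return min(surplus, max_charge_w)
```

A DRL agent improves on such a rule by also anticipating future PV generation and departure times, which is where the abstract locates the optimization potential for low-battery households.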


Deep Reinforcement Learning for Long-Short Portfolio Optimization

Huang, Gang, Zhou, Xiaohua, Song, Qingyang

arXiv.org Artificial Intelligence

With the rapid development of artificial intelligence, data-driven methods effectively overcome limitations in traditional portfolio optimization. Conventional models primarily employ long-only mechanisms, excluding highly correlated assets to diversify risk. However, incorporating short-selling enables low-risk arbitrage through hedging correlated assets. This paper constructs a Deep Reinforcement Learning (DRL) portfolio management framework with short-selling mechanisms conforming to actual trading rules, exploring strategies for excess returns in China's A-share market. Key innovations include: (1) Development of a comprehensive short-selling mechanism in continuous trading that accounts for dynamic evolution of transactions across time periods; (2) Design of a long-short optimization framework integrating deep neural networks for processing multi-dimensional financial time series with mean Sharpe ratio reward functions. Empirical results show the DRL model with short-selling demonstrates significant optimization capabilities, achieving consistent positive returns during backtesting periods. Compared to traditional approaches, this model delivers superior risk-adjusted returns while reducing maximum drawdown. From an allocation perspective, the DRL model establishes a robust investment style, enhancing defensive capabilities through strategic avoidance of underperforming assets and balanced capital allocation. This research contributes to portfolio theory while providing novel methodologies for quantitative investment practice.
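The mean-Sharpe-ratio reward this abstract describes can be made concrete with a small sketch. This is a generic illustration of a Sharpe-ratio reward over long-short weights (negative weights denote short positions), not the paper's framework; the function names are hypothetical.

```python
import statistics

def portfolio_returns(weights, asset_returns):
    """Per-period portfolio return; negative weights represent short positions."""
    return [sum(w * r for w, r in zip(weights, row)) for row in asset_returns]

def sharpe_reward(returns, risk_free=0.0):
    """Sample Sharpe ratio of excess returns, used as the RL reward signal."""
    excess = [r - risk_free for r in returns]
    mean = statistics.fmean(excess)
    sd = statistics.stdev(excess)
    return mean / sd if sd > 0 else 0.0
```

Hedging shows up directly in the arithmetic: a 50/50 long-short pair earns the spread between the two assets, which is how correlated assets can be held jointly at low net risk.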


NTP-INT: Network Traffic Prediction-Driven In-band Network Telemetry for High-load Switches

Zhang, Penghui, Zhang, Hua, Dai, Yuqi, Zeng, Cheng, Wang, Jingyu, Liao, Jianxin

arXiv.org Artificial Intelligence

In-band network telemetry (INT) is essential to network management due to its real-time visibility. However, because of the rapid increase in network devices and services, it has become crucial to have targeted access to detailed network information in a dynamic network environment. This paper proposes an intelligent network telemetry system called NTP-INT to obtain more fine-grained network information on high-load switches. Specifically, NTP-INT consists of three modules: network traffic prediction module, network pruning module, and probe path planning module. Firstly, the network traffic prediction module adopts a Multi-Temporal Graph Neural Network (MTGNN) to predict future network traffic and identify high-load switches. Then, we design the network pruning algorithm to generate a subnetwork covering all high-load switches to reduce the complexity of probe path planning. Finally, the probe path planning module uses an attention-mechanism-based deep reinforcement learning (DRL) model to plan efficient probe paths in the network slice. The experimental results demonstrate that NTP-INT can acquire more precise network information on high-load switches while decreasing the control overhead by 50%.
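The pruning step in this pipeline (reduce the topology to a subnetwork covering the high-load switches before planning probe paths) can be sketched with a simple one-hop rule. This is an assumed, illustrative pruning policy, not the paper's algorithm; the function name and the neighbor-inclusion rule are hypothetical.

```python
def prune(adj, load, threshold):
    """Keep switches at or above the load threshold plus their one-hop
    neighbors; drop all other nodes and the edges touching them."""
    high = {n for n, l in load.items() if l >= threshold}
    keep = set(high)
    for n in high:
        keep.update(adj[n])  # neighbors are kept so probes can reach high-load nodes
    return {n: [m for m in adj[n] if m in keep] for n in keep}
```

Shrinking the graph this way reduces the action space the downstream DRL path planner must search, which is the stated purpose of the pruning module.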


Dynamic Portfolio Optimization via Augmented DDPG with Quantum Price Levels-Based Trading Strategy

Lin, Runsheng, Xing, Zihan, Ma, Mingze, Lee, Raymond S. T.

arXiv.org Artificial Intelligence

With the development of deep learning, the Dynamic Portfolio Optimization (DPO) problem has received a lot of attention in recent years, not only in the field of finance but also in the field of deep learning. Some advanced research in recent years has proposed applying Deep Reinforcement Learning (DRL) to the DPO problem, which has been shown to be more advantageous than supervised learning in solving it. However, there are still certain unsolved issues: 1) DRL algorithms usually have the problems of slow learning speed and high sample complexity, which is especially problematic when dealing with complex financial data. 2) Researchers use DRL simply for the purpose of obtaining high returns, but pay little attention to the problem of risk control and trading strategy, which affects the stability of model returns. In order to address these issues, in this study we revamped the intrinsic structure of the model based on the Deep Deterministic Policy Gradient (DDPG) and proposed the Augmented DDPG model. In addition, we proposed an innovative risk control strategy based on Quantum Price Levels (QPLs) derived from Quantum Finance Theory (QFT). Our experimental results revealed that our model has better profitability as well as risk control ability with less sample complexity in the DPO problem compared to the baseline models.
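One stabilizing ingredient standard in the DDPG family this abstract builds on is the soft (Polyak) target-network update. The sketch below shows that single step on parameters flattened to plain lists for illustration; it is a generic DDPG detail, not the Augmented DDPG model itself.

```python
def soft_update(target, online, tau=0.005):
    """Polyak averaging: the target network slowly tracks the online network,
    theta_target <- tau * theta_online + (1 - tau) * theta_target."""
    return [tau * o + (1.0 - tau) * t for t, o in zip(target, online)]
```

A small `tau` keeps the bootstrap targets nearly fixed between updates, which damps the value-estimate oscillations that otherwise make DDPG-style training unstable on noisy financial data.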


Advanced Persistent Threats (APT) Attribution Using Deep Reinforcement Learning

Basnet, Animesh Singh, Ghanem, Mohamed Chahine, Dunsin, Dipo, Sowinski-Mydlarz, Wiktor

arXiv.org Artificial Intelligence

The development of the DRL model for malware attribution involved extensive research, iterative coding, and numerous adjustments based on the insights gathered from predecessor models and contemporary research papers. This preparatory work was essential to establish a robust foundation for the model, ensuring it could adapt and respond effectively to the dynamic nature of malware threats. Initially, the model struggled with low accuracy levels, but through persistent adjustments to its architecture and learning algorithms, accuracy improved dramatically from about 7 percent to over 73 percent in early iterations. By the end of the training, the model consistently reached accuracy levels near 98 percent, demonstrating its strong capability to accurately recognise and attribute malware activities. This upward trajectory in training accuracy is graphically represented in the Figure, which illustrates the model's maturation and increasing proficiency over time.